Using computational notebooks (e.g., Jupyter Notebook), data scientists rationalize their exploratory data analysis (EDA) based on their prior experience and external knowledge such as online examples. For novices or data scientists who lack specific knowledge about the dataset or problem at hand, effectively obtaining and understanding such external information is critical to carrying out EDA. This paper presents EDAssistant, a JupyterLab extension that supports EDA with in-situ search of example notebooks and recommendation of useful APIs, powered by novel interactive visualizations of the search results. The code search and recommendation are enabled by state-of-the-art machine learning models trained on a large corpus of EDA notebooks collected online. A user study was conducted to investigate both EDAssistant and data scientists' current practice (i.e., using external search engines). The results demonstrate the effectiveness and usefulness of EDAssistant, and participants appreciated its smooth, in-context support of EDA. We also report several design implications regarding code recommendation tools.
Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors after training, it remains unclear whether robust training methods can be devised to prevent backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of \emph{anti-backdoor learning}, which aims to train \emph{clean} models given backdoor-poisoned data. We frame the overall learning process as a dual task of learning the \emph{clean} and the \emph{backdoor} portions of the data. From this view, we identify two inherent weaknesses of backdoor attacks: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on the backdoored data; and 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), that automatically prevents backdoor attacks during training. ABL introduces a two-stage \emph{gradient ascent} mechanism into standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks empirically show that models trained with ABL on backdoor-poisoned data achieve the same performance as if they had been trained on purely clean data. Code is available at \url{https://github.com/bboylyg/ABL}.
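As a rough illustration of the two-stage mechanism described above, the following PyTorch-style sketch shows how a flooding-style gradient ascent loss, a low-loss isolation step, and a later unlearning loss could be wired together. The threshold `gamma`, the isolation ratio, and all function names are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def abl_stage1_loss(per_sample_loss, gamma=0.5):
    # Local gradient ascent: flip the gradient sign for samples whose loss has
    # already dropped below gamma (suspiciously easy to fit, likely backdoored),
    # which slows down fitting them and keeps their losses separable.
    return (torch.sign(per_sample_loss - gamma) * per_sample_loss).mean()

def isolate_suspects(per_sample_loss, ratio=0.01):
    # After the early (isolation) stage, flag the lowest-loss fraction of the
    # training set as candidate backdoor examples.
    k = max(1, int(len(per_sample_loss) * ratio))
    return torch.argsort(per_sample_loss)[:k]

def abl_stage2_loss(clean_loss, suspect_loss):
    # Later stage: keep minimizing loss on the presumed-clean split while doing
    # gradient ascent on the isolated split, breaking the trigger-to-target
    # correlation.
    return clean_loss.mean() - suspect_loss.mean()
```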
There is a growing interest in developing unlearnable examples (UEs) against visual privacy leaks on the Internet. UEs are training samples with invisible but unlearnable noise added to them, which has been found to prevent unauthorized training of machine learning models. UEs are typically generated via a bilevel optimization framework with a surrogate model to remove (minimize) errors from the original samples, and are then applied to protect the data against unknown target models. However, existing UE generation methods all rely on an ideal assumption called label-consistency, where the hackers and protectors are assumed to hold the same label for a given sample. In this work, we propose and promote a more practical label-agnostic setting, where the hackers may exploit the protected data quite differently from the protectors. For example, an m-class unlearnable dataset held by the protector may be exploited by the hacker as an n-class dataset. Existing UE generation methods are rendered ineffective in this challenging setting. To tackle this challenge, we present a novel technique called Unlearnable Clusters (UCs) to generate label-agnostic unlearnable examples with cluster-wise perturbations. Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains. We empirically verify the effectiveness of our proposed approach under a variety of settings with different datasets, target models, and even the commercial platforms Microsoft Azure and Baidu PaddlePaddle.
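To make the cluster-wise idea concrete, here is a minimal sketch of the bookkeeping involved: samples are grouped by surrogate-model (e.g., CLIP) features, and every cluster shares one perturbation regardless of how the data will later be labeled. The perturbation itself is a random stand-in here; the paper's actual optimization of the cluster-wise noise (and any generator it uses) is not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_wise_noise(features, images, k=10, eps=8 / 255, seed=0):
    """Assign one shared perturbation per feature cluster (label-agnostic).

    `features` are surrogate-model embeddings (e.g., CLIP image features) of
    shape (N, D); `images` are the raw samples in [0, 1]. Each cluster gets a
    fixed random pattern as a stand-in for the optimized cluster-wise noise.
    """
    rng = np.random.default_rng(seed)
    assign = KMeans(n_clusters=k, n_init=10).fit_predict(features)
    patterns = rng.uniform(-eps, eps, size=(k,) + images.shape[1:])
    protected = np.clip(images + patterns[assign], 0.0, 1.0)
    return protected, assign
```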
Intrinsic motivation is a promising exploration technique for solving reinforcement learning tasks with sparse or absent extrinsic rewards. There exist two technical challenges in implementing intrinsic motivation: 1) how to design a proper intrinsic objective to facilitate efficient exploration; and 2) how to combine the intrinsic objective with the extrinsic objective to help find better solutions. In the current literature, the intrinsic objectives are all designed in a task-agnostic manner and combined with the extrinsic objective via simple addition (or used by itself for reward-free pre-training). In this work, we show that these designs would fail in typical sparse-reward continuous control tasks. To address the problem, we propose Constrained Intrinsic Motivation (CIM) to leverage readily attainable task priors to construct a constrained intrinsic objective, and at the same time, exploit the Lagrangian method to adaptively balance the intrinsic and extrinsic objectives via a simultaneous-maximization framework. We empirically show, on multiple sparse-reward continuous control tasks, that our CIM approach achieves greatly improved performance and sample efficiency over state-of-the-art methods. Moreover, the key techniques of our CIM can also be plugged into existing methods to boost their performances.
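The Lagrangian balancing can be illustrated with a generic dual-ascent update: the policy maximizes the extrinsic return plus a learned multiplier times the intrinsic bonus, while the multiplier is adjusted to keep the intrinsic term near a constraint threshold. The threshold, step sizes, and sign convention below are assumptions for illustration, not CIM's exact constrained objective.

```python
import torch

class LagrangianBalancer:
    """Adaptively trade off an intrinsic bonus against the extrinsic return.

    The policy maximizes extrinsic + lambda * intrinsic, while lambda is
    updated by dual ascent to keep the intrinsic objective near `threshold`.
    Both inputs are expected to be scalar tensors (e.g., batch means).
    """

    def __init__(self, threshold=0.1, lr=1e-3):
        self.log_lambda = torch.zeros(1, requires_grad=True)
        self.threshold = threshold
        self.opt = torch.optim.Adam([self.log_lambda], lr=lr)

    def policy_objective(self, extrinsic, intrinsic):
        lam = self.log_lambda.exp().detach()  # treat lambda as a constant here
        return extrinsic + lam * intrinsic

    def update_multiplier(self, intrinsic):
        # Dual step: lambda grows when the intrinsic objective is below the
        # threshold (exploration under-satisfied) and shrinks otherwise.
        dual_loss = self.log_lambda.exp() * (intrinsic.detach() - self.threshold)
        self.opt.zero_grad()
        dual_loss.backward()
        self.opt.step()
```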
Backdoor attacks have emerged as one of the major security threats to deep learning models, as they can easily control the model's test-time predictions by pre-injecting a backdoor trigger into the model at training time. While backdoor attacks have been extensively studied on images, few works have investigated the threat of backdoor attacks on time series data. To fill this gap, in this paper we present a novel generative approach for time series backdoor attacks against deep learning based time series classifiers. Backdoor attacks have two main goals: high stealthiness and a high attack success rate. We find that, compared to images, it can be more challenging to achieve these two goals on time series. This is because time series have fewer input dimensions and lower degrees of freedom, making it hard to achieve a high attack success rate without compromising stealthiness. Our generative approach addresses this challenge by generating trigger patterns that are as realistic as real time-series patterns, achieving a high attack success rate without causing a significant drop in clean accuracy. We also show that our proposed attack is resistant to potential backdoor defenses. Furthermore, we propose a novel universal generator that can poison any type of time series, enabling universal attacks without the need to fine-tune the generative model for new time series datasets.
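Below is a toy sketch of what a generative, additive trigger for univariate time series and the corresponding poisoning step could look like. The 1-D convolutional generator, the amplitude bound `eps`, and the poisoning rate are illustrative choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    """Toy generator mapping a time series to a bounded additive trigger."""

    def __init__(self, eps=0.1):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=5, padding=2), nn.Tanh(),
        )

    def forward(self, x):                   # x: (batch, 1, length)
        return x + self.eps * self.net(x)   # input-conditioned, small-amplitude trigger

def poison(x, y, generator, target_class, rate=0.1):
    # Replace a small fraction of samples with triggered versions relabeled
    # to the attacker's target class.
    n = max(1, int(len(x) * rate))
    idx = torch.randperm(len(x))[:n]
    x, y = x.clone(), y.clone()
    x[idx] = generator(x[idx]).detach()
    y[idx] = target_class
    return x, y
```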
In this paper, we propose a novel unified framework for highlight detection and removal in multiple scenes, including synthetic images, face images, natural images, and text images. The framework consists of three main components: a highlight feature extractor module, a coarse highlight removal module, and a refined highlight removal module. First, the highlight feature extractor module directly separates the highlight features from the non-highlight features of the original highlight image. A coarsely highlight-removed image is then obtained with the coarse removal network. To further improve the removal quality, a refined highlight-removed image is finally produced by the refinement module, which is based on a contextual highlight attention mechanism. Extensive experimental results across multiple scenes show that the proposed framework achieves excellent visual quality for highlight removal and state-of-the-art results on several quantitative evaluation metrics. Our algorithm is also, for the first time, applied to video highlight removal, with encouraging results.
Crowd counting is a regression task that estimates the number of people in a scene image, and it plays a vital role in a range of safety-critical applications such as video surveillance, traffic monitoring, and flow control. In this paper, we investigate the vulnerability of deep learning based crowd counting models to backdoor attacks, a major security threat to deep learning. A backdoor attacker implants a backdoor trigger into a target model via data poisoning so as to control the model's predictions at test time. Different from image classification models, on which most existing backdoor attacks have been developed and tested, crowd counting models are regression models that output multi-dimensional density maps, and thus require different techniques to manipulate. In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) that attack the model to produce arbitrarily large or small density estimations. Experimental results demonstrate the effectiveness of our DMBA attacks on five classic crowd counting models and four types of datasets. We also provide an in-depth analysis of the unique challenges of backdooring crowd counting models and reveal two key ingredients of effective attacks: 1) full and dense triggers, and 2) manipulation of the ground truth counts or density maps. Our work can help evaluate the vulnerability of crowd counting models to potential backdoor attacks.
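A minimal sketch of the two ingredients named above (a dense trigger on the input and manipulation of the ground-truth density map) might look as follows; the blending weight and scaling factor are illustrative values, not the paper's configuration.

```python
import numpy as np

def dmba_poison(image, density_map, trigger, alpha=0.1, scale=4.0):
    """Poison one crowd-counting training pair.

    `trigger` is a dense, image-sized pattern blended into the input, and the
    ground-truth density map is rescaled so the backdoored model learns to
    output an arbitrarily large (scale > 1) or small (scale < 1) count
    whenever the trigger is present.
    """
    poisoned_img = np.clip((1.0 - alpha) * image + alpha * trigger, 0.0, 1.0)
    poisoned_map = density_map * scale
    return poisoned_img, poisoned_map
```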
Improving the robustness of deep neural networks (DNNs) against adversarial examples is an important yet challenging problem for secure deep learning. Among existing defense techniques, adversarial training with Projected Gradient Descent (PGD) is one of the most effective. Adversarial training solves a min-max optimization problem, in which the \textit{inner maximization} generates adversarial examples by maximizing the classification loss, and the outer minimization finds model parameters by minimizing the loss on the adversarial examples generated from the inner maximization. A criterion that measures how well the inner maximization is solved is therefore crucial for adversarial training. In this paper, we propose such a criterion, namely the First-Order Stationary Condition for constrained optimization (FOSC), to quantitatively evaluate the convergence quality of the adversarial examples found in the inner maximization. With FOSC, we find that to ensure better robustness, it is essential to use adversarial examples with better convergence quality at the \textit{later stages} of training. Yet at the early stages, high convergence quality adversarial examples are not necessary and may even lead to poor robustness. Based on these observations, we propose a \textit{dynamic} training strategy that gradually increases the convergence quality of the generated adversarial examples, which significantly improves the robustness of adversarial training. Our theoretical and empirical results show the effectiveness of the proposed method.
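The abstract does not spell the criterion out; assuming the standard first-order stationarity gap over an $\ell_\infty$ ball of radius $\epsilon$, which admits the closed form $c(x) = \epsilon\,\|\nabla_x \ell(x)\|_1 - \langle x - x_0, \nabla_x \ell(x)\rangle$, it can be computed as in the sketch below (lower values mean better-converged adversarial examples).

```python
import torch

def fosc(model, loss_fn, x_adv, x_clean, y, eps):
    """First-order stationarity gap of adversarial examples on an l_inf ball
    of radius eps around x_clean (lower = better-converged inner maximization).
    """
    x_adv = x_adv.clone().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    g = grad.flatten(1)
    d = (x_adv - x_clean).flatten(1)
    return (eps * g.abs().sum(dim=1) - (d * g).sum(dim=1)).detach()
```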
Spinal degeneration plagues many elders, office workers, and even the younger generations. Effective pharmaceutical or surgical interventions can help relieve degenerative spine conditions. However, the traditional diagnosis procedure is often too laborious: clinical experts need to detect discs and vertebrae from spinal magnetic resonance imaging (MRI) or computed tomography (CT) images as a preliminary step for pathological diagnosis or preoperative evaluation. Machine learning systems have been developed to aid this procedure, generally following a two-stage methodology: anatomical localization first, then pathological classification. Towards more efficient and accurate diagnosis, we propose a one-stage detection framework, termed SpineOne, that simultaneously localizes and classifies degenerative discs and vertebrae from MRI slices. SpineOne is built upon three key techniques: 1) a new design of the keypoint heatmap to facilitate simultaneous keypoint localization and classification; 2) attention modules to better differentiate the representations of discs and vertebrae; and 3) a novel gradient-guided objective association mechanism to associate multiple learning objectives at the later training stages. Empirical results on the Spinal Disease Intelligent Diagnosis Tianchi Competition (SDID-TC) dataset of 550 exams demonstrate that our approach surpasses existing methods by a large margin.
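As a generic illustration of how a keypoint heatmap can carry both localization and classification signals, the sketch below renders one Gaussian heatmap channel per pathology class, so the peak position encodes location and the peak channel encodes the class. This is a standard CenterNet-style construction used here for illustration; SpineOne's actual heatmap design may differ.

```python
import numpy as np

def render_class_heatmaps(keypoints, classes, shape, num_classes, sigma=2.0):
    """Render Gaussian keypoint heatmaps with one channel per pathology class,
    so a single head supervises both keypoint position and class membership.
    """
    h, w = shape
    heatmaps = np.zeros((num_classes, h, w), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    for (cx, cy), c in zip(keypoints, classes):
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        heatmaps[c] = np.maximum(heatmaps[c], g)  # keep the strongest response
    return heatmaps
```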
Automated detection of cardiac arrhythmia from electrocardiograms (ECGs) requires a reliable and trustworthy system that maintains high accuracy under electrical perturbations. Many machine learning approaches have reached human-level performance in classifying arrhythmia from ECGs. However, these architectures are vulnerable to adversarial attacks, which can cause ECG signals to be misclassified by degrading the model's accuracy. Adversarial attacks are small, crafted perturbations injected into the original data that manifest as out-of-distribution shifts in the signal and flip the predicted class. Misuse of these perturbations for false hospitalizations and insurance fraud therefore raises security concerns. To mitigate this problem, we introduce the first conditional generative adversarial network (GAN) that is robust against adversarial attacks on ECG signals while maintaining high accuracy. Our architecture integrates a new class-weighted objective function for identifying adversarial perturbations, together with novel blocks for discerning and combining out-of-distribution shifts in the signals during the learning process, to accurately classify various arrhythmia types. Furthermore, we benchmark our architecture against six different white-box and black-box attacks and compare it with other recently proposed arrhythmia classification models on two publicly available ECG arrhythmia datasets. The experiments confirm that our model is more robust against such adversarial attacks when classifying arrhythmia with high accuracy.
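As an illustration of class weighting in this setting (the abstract does not specify the exact objective), the sketch below builds a cross-entropy loss whose weights are inversely proportional to class frequency, so rare arrhythmia classes are not drowned out by the majority class.

```python
import torch
import torch.nn as nn

def class_weighted_ce(class_counts):
    """Cross-entropy with weights inversely proportional to class frequency,
    so rare arrhythmia classes contribute as much as the majority class.
    """
    counts = torch.as_tensor(class_counts, dtype=torch.float32)
    weights = counts.sum() / (len(counts) * counts)
    return nn.CrossEntropyLoss(weight=weights)

# Example: four arrhythmia classes with imbalanced sample counts.
criterion = class_weighted_ce([9000, 500, 300, 200])
```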